Getting Started with RNA Velocity¶

RNA velocity is based on bridging measurements to a underlying mechanism, mRNA splicing, with two modes indicating the current and future state [RNA velocity—current challenges and future perspectives]. It is a method used to predict the future gene expression of a cell based on the measurement of both spliced and unspliced transcripts of mRNA [2].

RNA velocity could be used to infer the direction of gene expression changes in single-cell RNA sequencing (scRNA-seq) data. It provides insights into the future state of individual cells by using the abundance of unspliced to spliced RNA transcripts. This ratio can indicate the transcriptional dynamics and potential fate of a cell, such as whether it is transitioning from one cell type to another or undergoing differentiation [RNA velocity of single cells].

  • velocyto

Velocyto is a package for the analysis of expression dynamics in single cell RNA seq data. In particular, it enables estimations of RNA velocities of single cells by distinguishing unspliced and spliced mRNAs in standard single-cell RNA sequencing protocols. It is the first paper proposed the concept of RNA velocity. velocyto predicted RNA velocity by solving the proposed differential equations for each geneRNA velocity of single cells].

  • scVelo

scVelo is a method that solves the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to systems with transient cell states, which are common in development and in response to perturbations. scVelo was applied to disentangling subpopulation kinetics in neurogenesis and pancreatic endocrinogenesis. scVelo demonstrate the capabilities of the dynamical model on various cell lineages in hippocampal dentate gyrus neurogenesis and pancreatic endocrinogenesis [Generalizing RNA velocity to transient cell states through dynamical modeling].

Here,I will go through the basics of scVelo. The following tutorials go straight into analysis of RNA velocity, latent time, driver identification and many more.

First of all, the input data for scVelo are two count matrices of pre-mature (unspliced) and mature (spliced) abundances, which can were obtained from standard sequencing protocols, using the velocyto.

In [108]:
#!pip install tqdm
#!pip install ipywidgets
In [109]:
#!pip install scvelo
#!pip install scvi-tools
In [110]:
#In this tutorial, you need to install the following version of Numpy and Pandas
#!pip install numpy==1.22
#!pip install pandas==1.1.5
In [111]:
import scanpy as sc
import pandas as pd
#import cellrank as cr
import scvelo as scv
import loompy
scv.logging.print_version()
Running scvelo 0.2.5 (python 3.9.12) on 2023-11-03 15:36.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
ERROR: XMLRPC request failed [code: -32500]
RuntimeError: PyPI no longer supports 'pip search' (or XML-RPC search). Please use https://pypi.org/search (via a browser) instead. See https://warehouse.pypa.io/api-reference/xml-rpc.html#deprecated-methods for more information.
In [112]:
scv.settings.verbosity = 3  # show errors(0), warnings(1), info(2), hints(3)
scv.settings.presenter_view = True  # set max width size for presenter view
scv.set_figure_params('scvelo')  # for beautified visualization
In [113]:
#ds = loompy.connect("/media/aruna/DATA/171012_SAGI_ORI_8_HUMAN_10X/C6/velocyto/C6.loom")
#loompy.combine(files, '/home/aruna/Desktop/Analysis/Project1/Project1_loomp.loom', key="Accession")
In [114]:
adata = scv.read('/home/aruna/Desktop/Analysis/Project1/Data/Project1_merged.h5ad', cache=True)
adata
Out[114]:
AnnData object with n_obs × n_vars = 46110 × 1374
    obs: 'batch', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'condition', 'cell_type'
    var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'batch_colors', 'cell_type_colors', 'condition_colors', 'dendrogram_leiden', 'hvg', 'leiden', 'leiden_colors', 'leiden_sizes', 'log1p', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'umap'
    obsm: 'X_pca', 'X_umap', 'ora_estimate', 'ora_pvals'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'connectivities', 'distances'
In [115]:
ldata = scv.read('/home/aruna/Desktop/Analysis/Project1/Data/Project1_loomp.loom', cache=True)
ldata
Out[115]:
AnnData object with n_obs × n_vars = 46529 × 20070
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
In [116]:
scv.utils.clean_obs_names(adata)
scv.utils.clean_obs_names(ldata)
adata = scv.utils.merge(adata, ldata)
In [117]:
#results_file = '/home/aruna/Desktop/Analysis/Project1/Data/Project1&loom.h5ad'  
#adata.write(results_file)
In [118]:
adata
Out[118]:
AnnData object with n_obs × n_vars = 46110 × 1229
    obs: 'batch', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'condition', 'cell_type', 'sample_batch', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size'
    var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std', 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    uns: 'batch_colors', 'cell_type_colors', 'condition_colors', 'dendrogram_leiden', 'hvg', 'leiden', 'leiden_colors', 'leiden_sizes', 'log1p', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'umap'
    obsm: 'X_pca', 'X_umap', 'ora_estimate', 'ora_pvals'
    varm: 'PCs'
    layers: 'counts', 'ambiguous', 'matrix', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances'
In [119]:
adata.layers
Out[119]:
Layers with keys: counts, ambiguous, matrix, spliced, unspliced
In [120]:
scv.pl.proportions(adata, groupby="cell_type")

Here, the proportions of spliced/unspliced counts are displayed. Depending on the protocol used (Drop-Seq, Smart-Seq), we typically have between 10%-25% of unspliced molecules containing intronic sequences. We also advice you to examine the variations on cluster level to verify consistency in splicing efficiency. Here, we find variations as expected, with slightly lower unspliced proportions at cycling ductal cells, then higher proportion at cell fate commitment in Ngn3-high and Pre-endocrine cells where many genes start to be transcribed.

In [121]:
sc.pl.embedding(adata, basis="umap", color=["condition", "cell_type"])

Estimate RNA velocity¶

RNA velocity estimation can currently be tackled with three existing approaches:

• steady-state / deterministic model (using steady-state residuals)

• stochastic model (using second-order moments),

• dynamical model (using a likelihood-based framework).

The steady-state / deterministic model, as being used in velocyto, estimates velocities as follows: Under the assumption that transcriptional phases (induction and repression) last sufficiently long to reach a steady-state equilibrium (active and inactive), velocities are quantified as the deviation of the observed ratio from its steady-state ratio. The equilibrium mRNA levels are approximated with a linear regression on the presumed steady states in the lower and upper quantiles. This simplification makes two fundamental assumptions: a common splicing rate across genes and steady-state mRNA levels to be reflected in the data. It can lead to errors in velocity estimates and cellular states as the assumptions are often violated, in particular when a population comprises multiple heterogeneous subpopulation dynamics.

The stochastic model aims to better capture the steady states. By treating transcription, splicing and degradation as probabilistic events, the resulting Markov process is approximated by moment equations. By including secondorder moments, it exploits not only the balance of unspliced to spliced mRNA levels but also their covariation. It has been demonstrated on the endocrine pancreas that stochasticity adds valuable information, overall yielding higher consistency than the deterministic model, while remaining as efficient in computation time.

The dynamical model (most powerful while computationally most expensive) solves the full dynamics of splicing kinetics for each gene. It thereby adapts RNA velocity to widely varying specifications such as non-stationary populations, as does not rely on the restrictions of a common splicing rate or steady states to be sampled. The splicing dynamics

𝑑𝑢(𝑡)/𝑑𝑡 = 𝛼𝑘(𝑡) − 𝛽𝑢(𝑡), (4.1)

𝑑𝑠(𝑡)/𝑑𝑡 = 𝛽𝑢(𝑡) − 𝛾𝑠(4.2) (𝑡),

is solved in a likelihood-based expectation-maximization framework, by iteratively estimating the parameters of reaction rates and latent cell-specific variables, i.e. transcriptional state k and cell-internal latent time t. It thereby aims to learn the unspliced/spliced phase trajectory. Four transcriptional states are modeled to account for all possible configurations of gene activity: two dynamic transient states (induction and repression) and two steady states (active and inactive) potentially reached after each dynamic transition.

In [122]:
scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.normalize_per_cell(adata)
scv.pp.filter_genes_dispersion(adata, n_top_genes=2000)
scv.pp.log1p(adata)
Filtered out 119 genes that are detected 20 counts (shared).
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
Normalized count data: spliced, unspliced.
Skip filtering by dispersion since number of variables are less than `n_top_genes`.
In [123]:
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
Filtered out 8 genes that are detected 20 counts (shared).
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize spliced as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize unspliced as it looks processed already. To enforce normalization, set `enforce=True`.
Skip filtering by dispersion since number of variables are less than `n_top_genes`.
WARNING: Did not modify X as it looks preprocessed already.
computing neighbors
    finished (0:00:05) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:03) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
In [ ]:
scv.tl.recover_dynamics(adata)
recovering dynamics (using 1/16 cores)
  0%|          | 0/576 [00:00<?, ?gene/s]
In [22]:
scv.tl.velocity(adata, mode='dynamical')
computing velocities
    finished (0:00:04) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
In [24]:
scv.tl.velocity_graph(adata)
computing velocity graph (using 1/16 cores)
  0%|          | 0/46110 [00:00<?, ?cells/s]
    finished (0:00:58) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
In [23]:
adata
Out[23]:
AnnData object with n_obs × n_vars = 46110 × 1102
    obs: 'batch', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'condition', 'cell_type', 'sample_batch', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts'
    var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'velocity_gamma', 'velocity_qreg_ratio', 'velocity_r2', 'velocity_genes'
    uns: 'batch_colors', 'cell_type_colors', 'condition_colors', 'dendrogram_leiden', 'hvg', 'leiden', 'leiden_colors', 'leiden_sizes', 'log1p', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'umap', 'velocity_params'
    obsm: 'X_pca', 'X_umap', 'ora_estimate', 'ora_pvals'
    varm: 'PCs'
    layers: 'counts', 'ambiguous', 'matrix', 'spliced', 'unspliced', 'Ms', 'Mu', 'velocity', 'variance_velocity'
    obsp: 'connectivities', 'distances'

For dynamycal graph. scv.tl.recover_dynamics(adata)

scv.tl.velocity(adata, mode='dynamical') scv.tl.velocity_graph(adata)

Project the velocities¶

In [52]:
scv.pl.velocity_embedding_stream(adata, basis='umap', color=['condition','cell_type'], 
                                 legend_loc='right margin', ncols=1)
In [51]:
scv.pl.velocity_embedding(adata, arrow_length=3, arrow_size=2, dpi=120, color=['condition', 'cell_type'], 
                          legend_loc='right margin')
In [49]:
scv.pl.velocity_embedding_grid(adata, basis='umap', color=['condition', 'cell_type'], 
                          legend_loc='right margin', ncols=1)

Interprete the velocities¶

See the gif here to get an idea of how to interpret a spliced vs. unspliced phase portrait. Gene activity is orchestrated by transcriptional regulation. Transcriptional induction for a particular gene results in an increase of (newly transcribed) precursor unspliced mRNAs while, conversely, repression or absence of transcription results in a decrease of unspliced mRNAs. Spliced mRNAs is produced from unspliced mRNA and follows the same trend with a time lag. Time is a hidden/latent variable. Thus, the dynamics needs to be inferred from what is actually measured: spliced and unspliced mRNAs as displayed in the phase portrait.

In [30]:
adata
Out[30]:
AnnData object with n_obs × n_vars = 46110 × 1102
    obs: 'batch', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'condition', 'cell_type', 'sample_batch', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition'
    var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'velocity_gamma', 'velocity_qreg_ratio', 'velocity_r2', 'velocity_genes'
    uns: 'batch_colors', 'cell_type_colors', 'condition_colors', 'dendrogram_leiden', 'hvg', 'leiden', 'leiden_colors', 'leiden_sizes', 'log1p', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'umap', 'velocity_params', 'velocity_graph', 'velocity_graph_neg'
    obsm: 'X_pca', 'X_umap', 'ora_estimate', 'ora_pvals', 'velocity_umap'
    varm: 'PCs'
    layers: 'counts', 'ambiguous', 'matrix', 'spliced', 'unspliced', 'Ms', 'Mu', 'velocity', 'variance_velocity'
    obsp: 'connectivities', 'distances'
In [44]:
adata.uns['rank_genes_groups']
Out[44]:
{'logfoldchanges': array([( 1.7952989 ,), ( 2.1080117 ,), ( 0.93538135,), ...,
        (-0.89968306,), (-1.0380716 ,), (-0.6599163 ,)],
       dtype=[('0', '<f4')]),
 'names': array([('HNRNPH1',), ('CTNNB1',), ('SET',), ..., ('MYL6',), ('SNRPB',),
        ('PFN1',)], dtype=[('0', 'O')]),
 'params': {'corr_method': 'benjamini-hochberg',
  'groupby': 'leiden',
  'method': 'wilcoxon',
  'reference': '1',
  'use_raw': True},
 'pvals': array([(0.,), (0.,), (0.,), ..., (0.,), (0.,), (0.,)],
       dtype=[('0', '<f8')]),
 'pvals_adj': array([(0.,), (0.,), (0.,), ..., (0.,), (0.,), (0.,)],
       dtype=[('0', '<f8')]),
 'scores': array([( 52.340935,), ( 51.936554,), ( 50.832672,), ..., (-49.375004,),
        (-51.56173 ,), (-53.219963,)], dtype=[('0', '<f4')])}
In [42]:
scv.pl.velocity(adata, ['HNRNPH1',  'CTNNB1', 'SET'])
/home/aruna/miniconda3/lib/python3.9/site-packages/scvelo/plotting/utils.py:869: MatplotlibDeprecationWarning: The draw_all function was deprecated in Matplotlib 3.6 and will be removed two minor releases later. Use fig.draw_without_rendering() instead.
  cb.draw_all()
In [29]:
#scv.pl.velocity(adata, gene_names)
In [30]:
#scv.pl.scatter(adata, gene_names)

Identify important genes¶

  • By Cell type
In [62]:
scv.tl.rank_velocity_genes(adata, groupby='cell_type', min_corr=.3)

df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
df.head()
ranking velocity genes
    finished (0:00:03) --> added 
    'rank_velocity_genes', sorted scores by group ids (adata.uns) 
    'spearmans_score', spearmans correlation scores (adata.var)
/tmp/ipykernel_8276/2552326810.py:3: DeprecationWarning: `scvelo.read_load.get_df` is deprecated since scVelo v0.2.4 and will be removed in a future version. Please use `scvelo.core.get_df` instead.
  df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
Out[62]:
B cells Basophils Cholangiocytes Ciliated cells Eosinophils Gamma delta T cells Ionocytes Keratinocytes Merkel cells Mesangial cells Monocytes Pancreatic progenitor cells Parathyroid chief cells
0 PLAU LBH CPLX2 CALR AFAP1 SLC7A5 TPM1 FOSL1 CYP24A1 TPM1 OAS1 AKR1C3 SGCD
1 FOSL1 USP9Y CA12 EIF2S3 B2M HSPG2 GALNT10 MOK ST6GAL2 IGFBP7 MX1 AKR1C2 RBFOX3
2 ETS2 IL15 FGFR1 LBR CA8 AREG HSPG2 COL4A6 CA12 PTPRE USP18 RPS4Y1 FGFR1
3 CNTN1 KLF7 AKR1C3 DDIT4 JAK2 MOK MYLK HMGA2 AKR1C3 MORC4 RNF213 FLNB PCDH9
4 CREB5 COL5A2 NTRK3 TNIP1 ARHGAP26 AURKA LMCD1 KCNMA1 DHRS3 PCDH9 MX2 PTPRJ CNTN1
In [34]:
scv.pl.velocity(adata, ['OAS1',  'SLC7A5', 'PLAU', 'AFAP1'])
  • By Condition
In [53]:
scv.tl.rank_velocity_genes(adata, groupby='condition', min_corr=.3)

df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
df.head()
ranking velocity genes
/home/aruna/miniconda3/lib/python3.9/site-packages/scvelo/tools/utils.py:501: DeprecationWarning: Please use `rankdata` from the `scipy.stats` namespace, the `scipy.stats.stats` namespace is deprecated.
  from scipy.stats.stats import rankdata
    finished (0:00:07) --> added 
    'rank_velocity_genes', sorted scores by group ids (adata.uns) 
    'spearmans_score', spearmans correlation scores (adata.var)
/tmp/ipykernel_8276/1080661622.py:3: DeprecationWarning: `scvelo.read_load.get_df` is deprecated since scVelo v0.2.4 and will be removed in a future version. Please use `scvelo.core.get_df` instead.
  df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
Out[53]:
Control Trait
0 GALNT10 RNF213
1 SLC7A5 SP100
2 CNTN1 MX1
3 CA12 IFI35
4 DHRS3 IFI6
In [59]:
# Plot gene markers for control 
scv.pl.velocity(adata, ['GALNT10',  'SLC7A5', 'CNTN1', 'CA12', 'DHRS3'])
In [60]:
# Plot gene markers for Traited
scv.pl.velocity(adata, ['RNF213',  'SP100', 'MX1', 'IFI35', 'IFI6'])

Speed and coherence¶

Two more useful stats: - The speed or rate of differentiation is given by the length of the velocity vector. - The coherence of the vector field (i.e., how a velocity vector correlates with its neighboring velocities) provides a measure of confidence.

In [66]:
scv.tl.velocity_confidence(adata)
keys = 'velocity_length', 'velocity_confidence'
scv.pl.scatter(adata, c=keys, cmap='coolwarm', perc=[5, 95])
--> added 'velocity_length' (adata.obs)
--> added 'velocity_confidence' (adata.obs)
--> added 'velocity_confidence_transition' (adata.obs)
/home/aruna/miniconda3/lib/python3.9/site-packages/scvelo/plotting/utils.py:869: MatplotlibDeprecationWarning: The draw_all function was deprecated in Matplotlib 3.6 and will be removed two minor releases later. Use fig.draw_without_rendering() instead.
  cb.draw_all()
In [67]:
df = adata.obs.groupby('cell_type')[keys].mean().T
df.style.background_gradient(cmap='coolwarm', axis=1)
Out[67]:
cell_type B cells Basophils Cholangiocytes Ciliated cells Eosinophils Gamma delta T cells Ionocytes Keratinocytes Merkel cells Mesangial cells Monocytes Pancreatic progenitor cells Parathyroid chief cells
velocity_length 5.904355 5.867963 5.831227 6.215628 9.850060 5.139949 5.803107 5.243363 6.022109 6.246026 7.276476 5.580410 5.759519
velocity_confidence 0.781378 0.721726 0.791648 0.705098 0.779553 0.735193 0.770925 0.731734 0.785222 0.786003 0.800558 0.715649 0.777177
In [68]:
df = adata.obs.groupby('condition')[keys].mean().T
df.style.background_gradient(cmap='coolwarm', axis=1)
Out[68]:
condition Control Trait
velocity_length 5.708857 6.792504
velocity_confidence 0.759915 0.760844
In [82]:
scv.pl.velocity_graph?

Velocity graph and pseudotime¶

We can visualize the velocity graph to portray all velocity-inferred cell-to-cell connections/transitions. It can be confined to high-probability transitions by setting a threshold.

In [81]:
scv.pl.velocity_graph(adata, threshold=.5, color='cell_type', 
                          legend_loc='right margin', ncols=1)
In [88]:
x, y = scv.utils.get_cell_transitions(adata, basis='umap', starting_cell=70)
ax = scv.pl.velocity_graph(adata, c='lightgrey', edge_width=.05, show=False)
ax = scv.pl.scatter(adata, x=x, y=y, s=120, c='ascending', cmap='gnuplot', ax=ax)
/home/aruna/miniconda3/lib/python3.9/site-packages/scvelo/plotting/utils.py:869: MatplotlibDeprecationWarning: The draw_all function was deprecated in Matplotlib 3.6 and will be removed two minor releases later. Use fig.draw_without_rendering() instead.
  cb.draw_all()

Finally, based on the velocity graph, a velocity pseudotime can be computed. After inferring a distribution over root cells from the graph, it measures the average number of steps it takes to reach a cell after walking along the graph starting from the root cells.

In [75]:
scv.tl.velocity_pseudotime(adata)
scv.pl.scatter(adata, color='velocity_pseudotime', cmap='gnuplot')
computing terminal states
    identified 10 regions of root cells and 1 region of end points .
    finished (0:00:08) --> added
    'root_cells', root cells of Markov diffusion process (adata.obs)
    'end_points', end points of Markov diffusion process (adata.obs)
/home/aruna/miniconda3/lib/python3.9/site-packages/scvelo/plotting/utils.py:869: MatplotlibDeprecationWarning: The draw_all function was deprecated in Matplotlib 3.6 and will be removed two minor releases later. Use fig.draw_without_rendering() instead.
  cb.draw_all()

PAGA velocity graph¶

PAGA graph abstraction has benchmarked as top-performing method for trajectory inference. It provides a graph-like map of the data topology with weighted edges corresponding to the connectivity between two clusters. Here, PAGA is extended by velocity-inferred directionality.

In [76]:
# PAGA requires to install igraph, if not done yet.
!pip install python-igraph --upgrade --quiet
In [77]:
# this is needed due to a current bug - bugfix is coming soon.
adata.uns['neighbors']['distances'] = adata.obsp['distances']
adata.uns['neighbors']['connectivities'] = adata.obsp['connectivities']

scv.tl.paga(adata, groups='cell_type')
df = scv.get_df(adata, 'paga/transitions_confidence', precision=2).T
df.style.background_gradient(cmap='Blues').format('{:.2g}')
running PAGA using priors: ['velocity_pseudotime']
    finished (0:00:08) --> added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns)
    'paga/transitions_confidence', velocity transitions (adata.uns)
Out[77]:
B cells Basophils Cholangiocytes Ciliated cells Eosinophils Gamma delta T cells Ionocytes Keratinocytes Merkel cells Mesangial cells Monocytes Pancreatic progenitor cells Parathyroid chief cells
B cells 0 0 0 0 0 0 0.63 1 0 0.067 0 0 0.39
Basophils 0 0 0 0 0 0 0 0 0 0 0 0.3 0
Cholangiocytes 0 0 0 0 0 0 0 0 0 0 0 0 0
Ciliated cells 0 0 0.21 0 0 0.31 0 0 0.77 0 0 0.46 0
Eosinophils 0 0 0 0 0 0 0 0 0 0 0 0 0
Gamma delta T cells 0 0 0 0 0 0 0 0 0 0 0 0 0
Ionocytes 0 0 0 0 0 0 0 0 0 0 0 0 0
Keratinocytes 0 0 0 0 0 0.47 0 0 0 0 0 0 0
Merkel cells 0 0 0 0 0 0 0 0 0 0 0 0 0
Mesangial cells 0 0 0 0 0 0 0 0 0 0 0 0 0
Monocytes 0 0 0 0.089 0.091 0 0 0 0 0 0 0 0
Pancreatic progenitor cells 0 0 0 0 0 0 0 0 0 0 0 0 0
Parathyroid chief cells 0 0 0 0 0 0 0 0 0 0 0 0 0

This reads from left/row to right/column, thus e.g. assigning a confident transition from Ductal to Ngn3 low EP.

This table can be summarized by a directed graph superimposed onto the UMAP embedding.

In [78]:
scv.pl.velocity_embedding_stream(adata, basis='umap', color='cell_type', 
                                 legend_loc='right margin', ncols=1)
In [79]:
scv.pl.paga(adata, basis='umap', size=50, alpha=.1,
            min_edge_width=2, node_size_scale=1.5)
WARNING: Invalid color key. Using grey instead.
In [ ]:
results_file = '/home/aruna/Desktop/Analysis/Project1/Data/Project1&Velocity.h5ad'  
adata.write(results_file)
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]: